Slot-Gated Modeling for Joint Slot Filling and Intent Prediction

The attention-based RNN model achieves the best performance on joint intent detection (ID) and slot filling (SF), but its ID and SF attention weights are learned independently. This paper proposes a slot-gate structure that focuses on learning the relationship between the intent and slot attention vectors, in order to obtain better semantic frames through global optimization. Experiments on the ATIS and Snips datasets show a 4.2% improvement in semantic frame accuracy over the attention-based model.

paper link
code link

Introduction

An example of slot filling and intent detection:

Figure 1: An example utterance with annotations of semantic slots in IOB format (S) and intent (I). B-dir and I-dir denote the director name.

The current state-of-the-art models jointly model ID and SF with attention + RNN, but they relate the two tasks only implicitly, through a shared loss function $loss_{total} = loss_{ID} + loss_{SF}$. This paper proposes a slot-gated mechanism that models the relationship explicitly.

The main contributions of this paper are:

1. the proposed slot-gated approach achieves better performance than the attention-based models;
2. the experiments on two SLU datasets show the generalization and the effectiveness of the proposed slot gate;
3. the gating results help us analyze the slot-intent relations.

Slot-Gated Model

The architecture of the proposed model is shown below:

Figure 2: The architecture of the proposed slot-gated models; left: Slot-Gated Model with Full Attention; right: Slot-Gated Model with Intent Attention.

Attention-Based RNN Model

The BiLSTM in Figure 2 takes the word sequence $\mathbf{x}=(x_{1},\dots,x_{T})$ as input and produces a forward hidden state $\overrightarrow{h_{i}}$ and a backward hidden state $\overleftarrow{h_{i}}$ at each step; the two are concatenated to obtain $h_{i}=[\overrightarrow{h_{i}};\overleftarrow{h_{i}}]$.
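A minimal sketch of this encoder, assuming PyTorch; the sizes are hypothetical, and `nn.LSTM` with `bidirectional=True` already returns the forward and backward states concatenated along the feature dimension:

```python
import torch
import torch.nn as nn

# Hypothetical sizes, for illustration only.
vocab_size, emb_dim, hidden_dim = 10000, 100, 64

embedding = nn.Embedding(vocab_size, emb_dim)
bilstm = nn.LSTM(emb_dim, hidden_dim, bidirectional=True, batch_first=True)

x = torch.randint(0, vocab_size, (1, 12))   # one utterance of T = 12 word ids
h, _ = bilstm(embedding(x))                 # h: (1, 12, 2 * hidden_dim), i.e. h_i = [forward; backward]
```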

Slot Filling

The SF task maps the input $\mathbf{x}=(x_{1},\dots,x_{T})$ to the output $\mathbf{y}=(y_{1}^{S},\dots,y_{T}^{S})$. For the hidden state $h_{i}$ of the word at each time step, the slot context vector $c_{i}^{S}$ is computed first (this is essentially self-attention, corresponding to the slot attention in Figure 2):
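Roughly, in the paper's notation, it is the attention-weighted sum of the hidden states:

$$c_{i}^{S}=\sum_{j=1}^{T}\alpha_{i,j}^{S}\,h_{j}$$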

$\alpha_{i,j}^{S}$ is the attention score:
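Roughly, it is a softmax over scores computed from the hidden states (here $\sigma$ denotes the activation function and $W_{he}^{S}$ a trainable weight matrix):

$$\alpha_{i,j}^{S}=\frac{\exp(e_{i,j})}{\sum_{k=1}^{T}\exp(e_{i,k})},\qquad e_{i,k}=\sigma\left(W_{he}^{S}\,h_{k}\right)$$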

Then $h_{i}$ and $c_{i}^{S}$ are used together to predict the slot label $y_{i}^{S}$ of the $i$-th word:
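Approximately, with $W_{hy}^{S}$ a trainable output matrix:

$$y_{i}^{S}=\operatorname{softmax}\left(W_{hy}^{S}\,(h_{i}+c_{i}^{S})\right)$$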

Intent Prediction

The intent context vector $c^{I}$ is computed in the same way as $c_{i}^{S}$; the difference is that intent prediction uses only the last hidden state $h_{T}$ of the BiLSTM:
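Approximately, with $W_{hy}^{I}$ a trainable output matrix:

$$y^{I}=\operatorname{softmax}\left(W_{hy}^{I}\,(h_{T}+c^{I})\right)$$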

Slot-Gated Mechanism

The main purpose of the slot gate is to use the intent context vector to improve slot filling. Its structure is as follows:

Figure 3: Illustration of the slot gate
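Roughly, in the paper's notation, the gate is a single scalar per time step,

$$g=\sum v\cdot\tanh\left(c_{i}^{S}+W\cdot c^{I}\right)$$

and the gated slot prediction becomes

$$y_{i}^{S}=\operatorname{softmax}\left(W_{hy}^{S}\,(h_{i}+c_{i}^{S}\cdot g)\right)$$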

where $v$ and $W$ are a trainable vector and a trainable matrix, respectively. The summation is done over the elements in one time step.

To isolate the effect of the slot gate, the paper also proposes a variant that removes the slot attention; see the right part of Figure 2.
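A minimal sketch of the gate, assuming PyTorch tensors and hypothetical shape names (an illustrative reading of the gate equation above, not the authors' released code):

```python
import torch

def slot_gate(c_slot, c_intent, v, W):
    """Compute the scalar slot gate g for every time step.

    c_slot:   (batch, T, d)  slot context vectors c_i^S (or h_i in the intent-only variant)
    c_intent: (batch, d)     intent context vector c^I
    v:        (d,)           trainable vector
    W:        (d, d)         trainable matrix
    """
    # tanh(c_i^S + W * c^I): broadcast the intent vector over the time steps
    mixed = torch.tanh(c_slot + (c_intent @ W.T).unsqueeze(1))   # (batch, T, d)
    # sum over the elements of one time step, weighted by v
    g = (mixed * v).sum(dim=-1, keepdim=True)                    # (batch, T, 1)
    return g

# The gated context c_i^S * g is then fed into the slot classifier:
# y_i^S = softmax(W_hy^S (h_i + c_i^S * g))
```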

Joint Optimization

The joint objective of the model is:

$$
\begin{aligned}
p(y^{S},y^{I}|\mathbf{x}) &= p(y^{I}|\mathbf{x})\prod_{t=1}^{T}p(y^{S}_{t}|\mathbf{x}) \\
&= p(y^{I}|x_{1},\dots,x_{T})\prod_{t=1}^{T}p(y^{S}_{t}|x_{1},\dots,x_{T})
\end{aligned}
$$

where $p(y^{S},y^{I}|\mathbf{x})$ is the joint conditional probability of SF and ID; the model is trained to maximize it.
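In practice, maximizing this probability corresponds to the summed cross-entropy loss $loss_{total}=loss_{ID}+loss_{SF}$ mentioned in the Introduction; a minimal sketch, assuming PyTorch logits and hypothetical shapes:

```python
import torch
import torch.nn.functional as F

def joint_loss(slot_logits, slot_labels, intent_logits, intent_label):
    """slot_logits: (T, n_slots); slot_labels: (T,);
    intent_logits: (1, n_intents); intent_label: (1,)."""
    loss_sf = F.cross_entropy(slot_logits, slot_labels)   # averaged over the T time steps
    loss_id = F.cross_entropy(intent_logits, intent_label)
    return loss_id + loss_sf
```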

Experiment

Dataset

The experiments use the ATIS (Airline Travel Information Systems) and Snips datasets.

Compared to the single-domain ATIS dataset, Snips is more complicated, mainly due to its intent diversity and larger vocabulary.

Table 2: Intents and examples in the Snips dataset.

Results and Analysis

Table 3: SLU performance on the ATIS and Snips datasets (%). † indicates a significant improvement over all baselines (p < 0.05).

According to Table 3, both slot-gated models outperform the baselines; however, the intent-attention variant performs best on ATIS, while the full-attention variant performs best on Snips.

Considering the different complexity of these datasets, the probable reason is that a simpler SLU task, such as ATIS, does not require the additional slot attention to achieve good results, and the slot gate alone provides enough cues for slot filling. On the other hand, Snips is more complex, so the slot attention is needed in order to model slot filling better (and to improve the semantic frame results as well).

The authors specifically highlight the slot-gated models' improvement in frame accuracy, because frame accuracy measures both tasks at once.

This may be credited to the proposed slot gate, which learns the slot-intent relations and thereby provides helpful information for global optimization of the joint model.

Conclusion

This paper focuses on learning explicit slot-intent relations by introducing a slot-gated mechanism into the state-of-the-art attention model, which allows slot filling to be conditioned on the learned intent result in order to achieve better SLU (joint slot filling and intent detection).